Abstract. - We analyze the performance of a linear code used for data compression of the
Slepian-Wolf type. In our framework, two correlated data sources are separately compressed into
codewords by Gallager-type codes and fed into a communication network through
two independent input terminals. At the output terminal, the received codewords are jointly
decoded by a practical algorithm based on the Thouless-Anderson-Palmer approach. Our
analysis shows that the achievable rate region given by the data compression theorem of
Slepian and Wolf is described by first-order phase transitions among several phases. The typical
performance of the practical decoder is also evaluated well by the replica method.



The ever-increasing information transmission in the modern world is based on network
communications. While many cutting-edge technologies have been developed to make
communication comfortable, few techniques are designed for network-based data transmission.
It is quite strange that even up-to-date information technologies are based on point-to-point
protocols, although the global computer network called 'the Internet' already exists.
Therefore, now is the time to focus on multi-terminal communication techniques.

Data compression, or source coding, is a scheme to reduce the size of a message (data) in
its information representation. In his seminal paper, Shannon showed that for an information
source represented by a distribution $\mathcal{P}(S)$ of an $N$-dimensional Boolean (binary)
vector $S$, one can employ another representation in which the message length $N$ is reduced
to $M (< N)$ without any distortion, if the code rate $R = M/N$ satisfies $R > H_2(S)$ in the
limit $N, M \to \infty$. Here,
$H_2(S) = -(1/N)\,\mathrm{Tr}_S\, \mathcal{P}(S) \log_2 \mathcal{P}(S)$ represents the binary
entropy per bit in the original representation $S$, indicating the optimal compression rate.

Unfortunately, Shannon's theorem itself is non-constructive and does not provide explicit
rules for devising optimal codes. Therefore, it is surprising that a practical code proposed
by Lempel and Ziv (LZ) in 1973 saturates Shannon's optimal compression limit in
the case of point-to-point communication. However, it should be emphasized that
generalizing the LZ codes to advanced data compression suitable for network communications
(NC) is difficult, although the importance of NC is rapidly increasing with the recent
development of the Internet. This is mainly because all the practical codes that saturate
Shannon's limit to date require complete knowledge of all source vectors coming into the
communication network, while in usual situations the compression should be carried out
independently at each terminal. Therefore, the quest for more efficient compression codes
that are suitable for NC still remains one of the most important topics in information theory.

Fig. 1 - (a) Slepian and Wolf system: A simple communication network introduced in the data
compression theorem of Slepian and Wolf. Separate coding is assumed in the distributed system.
(b) Achievable rate region: Code rates are classified into four categories according to whether
the two compressed data are decodable or not. The parameter regime where both data are
decodable without any distortion is termed the achievable rate region.

The purpose of this letter is to apply recent developments in research on error-correcting
codes (ECC) to this problem. More specifically, we investigate the efficacy
and the limitations of a linear compression scheme inspired by Gallager's ECC, which has
been actively investigated in both the information theory and physics communities,
when it is applied to a data compression problem introduced by Slepian and Wolf (SW) in
the research of NC [10]. Unlike the existing arguments in information theory, our approach
based on statistical mechanics makes it possible not only to assess the theoretical bounds of
the achievable performance but also to provide practical encoding/decoding methods that can
be performed on time scales linear in the data length.

Let us start by setting up the framework of the SW problem [10]. In a general scenario,
two correlated $N$-dimensional Boolean vectors $\xi$ and $\eta$ are independently compressed to
$M_1$- and $M_2$-dimensional vectors $u$ and $v$, respectively. These compressed data (or
codewords) $u$ and $v$ are decoded simultaneously by a single decoder to retrieve the original
data. A schematic representation of this system is shown in Fig. 1(a).

The codes used in this letter are composed of randomly selected sparse matrices $A$ and $B$
of dimensionality $M_1 \times N$ and $M_2 \times N$, respectively. These are constructed
similarly to those of Gallager's ECC, being characterized by $K_1$ and $K_2$ nonzero unit
elements per row and $C_1$ and $C_2$ nonzero unit elements per column, respectively. The
compression rates can differ between the two terminals. Counting the nonzero elements of each
matrix by rows and by columns gives $M_1 K_1 = N C_1$ and $M_2 K_2 = N C_2$, so the rates are
$R_1 = M_1/N = C_1/K_1$ and $R_2 = M_2/N = C_2/K_2$, respectively. While both matrices are
known to the decoder, each encoder only needs to know its own matrix; that is, encoding is
carried out separately in this scheme as $u = A\xi$ and $v = B\eta$, where Boolean arithmetic
is applied to the Boolean vectors. After receiving the codewords $u$ and $v$, the pair of
equations

$$u = AS, \qquad v = B\tau \qquad (1)$$

should be solved with respect to $S$ and $\tau$, which become the estimates of the original
data $\xi$ and $\eta$, respectively.
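
The separate encoding step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `gallager_matrix` and `compress` are hypothetical, and the matrix is built by stacking $C$ column-permuted blocks, which is one simple way to obtain exactly $K$ ones per row and $C$ per column and is not necessarily the ensemble used in this letter.

```python
import numpy as np

def gallager_matrix(n, k, c, rng):
    """Return an (n*c//k) x n 0/1 matrix with k ones per row and c per column.

    Assumed construction: c stacked blocks, each placing one 1 in every column.
    """
    if n % k != 0:
        raise ValueError("n must be a multiple of k")
    rows_per_block = n // k
    blocks = []
    for _ in range(c):
        perm = rng.permutation(n)  # random column order for this block
        block = np.zeros((rows_per_block, n), dtype=np.int64)
        for r in range(rows_per_block):
            block[r, perm[r * k:(r + 1) * k]] = 1
        blocks.append(block)
    return np.vstack(blocks)

def compress(matrix, source_bits):
    """Boolean (mod 2) matrix-vector product, i.e. the codeword of Eq. (1)."""
    return (matrix @ source_bits) % 2

rng = np.random.default_rng(0)
N, K, C = 24, 6, 3                    # rate R = C/K = 1/2
A = gallager_matrix(N, K, C, rng)     # known to encoder 1 and the decoder
B = gallager_matrix(N, K, C, rng)     # known to encoder 2 and the decoder
xi = rng.integers(0, 2, size=N)
flips = (rng.random(N) < 0.1).astype(np.int64)
eta = (xi + flips) % 2                # a correlated second source
u = compress(A, xi)                   # each encoder uses only its own matrix
v = compress(B, eta)
print(A.shape, u.shape, v.shape)
```

Each encoder touches only its own matrix and its own source vector, which is the separation that the Slepian-Wolf setting requires; the correlation between the sources is exploited only at the decoder.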

To facilitate the current investigation, we first map the problem to that of an Ising model
with finite connectivity. We employ the binary representation $(+1, -1)$ of the dynamical
variables $S$ and $\tau$ and of the vectors $u$ and $v$ rather than the Boolean $(0, 1)$ one;
the vector $u$ is generated by taking products of the relevant binary data bits,
$u_{\langle i_1, i_2, \ldots, i_{K_1} \rangle} = \xi_{i_1} \xi_{i_2} \cdots \xi_{i_{K_1}}$,
where the indices $i_1, i_2, \ldots, i_{K_1}$ correspond to the nonzero elements of $A$,
producing a binary version of $u$, and similarly for $v$. Assuming the thermodynamic limit
$N, M_1, M_2 \to \infty$ while keeping the code rates $R_1 = M_1/N$ and $R_2 = M_2/N$ finite is
quite natural, as communication to date generally requires transmitting large data, for which
finite-size corrections are likely to be negligible. To explore the system's capabilities, we
examine the partition function



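$$Z = \mathrm{Tr}_{S,\tau}\, \mathcal{P}(S, \tau)
\prod_{\langle i_1, i_2, \ldots, i_{K_1} \rangle}
\left[ 1 + \frac{1}{2} \mathcal{A}_{\langle i_1, i_2, \ldots, i_{K_1} \rangle}
\left( u_{\langle i_1, i_2, \ldots, i_{K_1} \rangle} S_{i_1} S_{i_2} \cdots S_{i_{K_1}} - 1 \right) \right]$$
$$\times \prod_{\langle i_1, i_2, \ldots, i_{K_2} \rangle}
\left[ 1 + \frac{1}{2} \mathcal{B}_{\langle i_1, i_2, \ldots, i_{K_2} \rangle}
\left( v_{\langle i_1, i_2, \ldots, i_{K_2} \rangle} \tau_{i_1} \tau_{i_2} \cdots \tau_{i_{K_2}} - 1 \right) \right]. \qquad (2)$$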



The tensor product $\mathcal{A}_{\langle i_1, i_2, \ldots, i_{K_1} \rangle}
u_{\langle i_1, i_2, \ldots, i_{K_1} \rangle}$, where
$u_{\langle i_1, i_2, \ldots, i_{K_1} \rangle} = \xi_{i_1} \xi_{i_2} \cdots \xi_{i_{K_1}}$, is the
binary equivalent of $A\xi$. Elements of the sparse connectivity tensor
$\mathcal{A}_{\langle i_1, i_2, \ldots, i_{K_1} \rangle}$ take the value 1 if the corresponding
indices of data are chosen (i.e., if all corresponding indices of the matrix $A$ are 1) and 0
otherwise; it has $C_1$ unit elements per $i$ index, representing the system's degree of
connectivity. Notice that if the product $S_{i_1} S_{i_2} \cdots S_{i_{K_1}}$ is in disagreement
with the corresponding element $u_{\langle i_1, i_2, \ldots, i_{K_1} \rangle}$, which implies an
error for the parity check, the corresponding contribution to the partition function $Z$
vanishes. Similar arguments are valid for $\mathcal{B}_{\langle i_1, i_2, \ldots, i_{K_2} \rangle}$
and $v_{\langle i_1, i_2, \ldots, i_{K_2} \rangle}$. The probability $\mathcal{P}(S, \tau)$
represents our prior knowledge of the data, including the correlation between the sources $\xi$
and $\eta$. Note that the dynamical variables $\tau$, introduced to estimate $\eta$, are
irrelevant to the performance measure with respect to the other data $\xi$.
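
A small numerical check may make the mapping concrete. The sketch below is a simplification, not the model analyzed here: it keeps a single source with a uniform prior, assumes that each parity check contributes a factor $(1 + u \prod_j S_j)/2$, and uses hypothetical helper names. It verifies that a mod-2 parity of Boolean bits becomes a product of $\pm 1$ spins, and that a brute-force partition function only collects configurations satisfying every check.

```python
import itertools
import numpy as np

def to_spins(bits):
    """Map Boolean values (0, 1) to Ising values (+1, -1)."""
    return 1 - 2 * np.asarray(bits)

def check_factor(u_spin, spins):
    """Return 1.0 if the product of the spins agrees with u_spin, else 0.0."""
    return (1 + u_spin * np.prod(spins)) / 2

def partition_function(rows, u_spins, n, prior):
    """Brute-force sum over all 2**n spin configurations (tiny n only)."""
    total = 0.0
    for config in itertools.product((1, -1), repeat=n):
        s = np.array(config)
        weight = prior(s)
        for row, u in zip(rows, u_spins):
            weight *= check_factor(u, s[row])
        total += weight
    return total

# Two parity checks on four bits; each row lists the nonzero column indices.
rows = [[0, 1, 2], [1, 2, 3]]
xi_bits = np.array([1, 0, 1, 1])
xi_spins = to_spins(xi_bits)

u_bits = np.array([xi_bits[row].sum() % 2 for row in rows])     # Boolean codeword
u_spins = np.array([np.prod(xi_spins[row]) for row in rows])    # Ising codeword

z = partition_function(rows, u_spins, n=4, prior=lambda s: 1 / 2 ** len(s))
print(u_bits, u_spins, z)
```

With two independent checks on four spins, 4 of the 16 configurations survive, so the uniform prior gives $Z = 4/16$.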

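Since the partition function Eq. (2) is invariant under the transformations
$S_i \to S_i \xi_i$, $\tau_i \to \tau_i \eta_i$,
$u_{\langle i_1, i_2, \ldots, i_{K_1} \rangle} \to u_{\langle i_1, i_2, \ldots, i_{K_1} \rangle}
\xi_{i_1} \xi_{i_2} \cdots \xi_{i_{K_1}} = 1$ and
$v_{\langle i_1, i_2, \ldots, i_{K_2} \rangle} \to v_{\langle i_1, i_2, \ldots, i_{K_2} \rangle}
\eta_{i_1} \eta_{i_2} \cdots \eta_{i_{K_2}} = 1$, it is useful to decouple the correlations
between the vectors $S$, $\tau$ and $\xi$, $\eta$. Rewriting Eq. (2) using this gauge, one
obtains a similar expression apart from the first factor, which becomes
$\mathcal{P}(S \otimes \xi, \tau \otimes \eta)$, where $S \otimes \xi = (S_i \xi_i)$ and
$\tau \otimes \eta = (\tau_i \eta_i)$ for $i = 1, 2, \ldots, N$.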

The random selection of elements in $A$ and $B$ introduces disorder to the system; we average
the logarithm of the partition function $Z(A, B, u, v)$ over the disorder and the statistical
properties of both data, using the replica method. In the calculation, a set of order
parameters
$q_{\alpha, \beta, \ldots, \gamma} = \frac{1}{N} \sum_{i=1}^{N} Z_i S_i^{\alpha} S_i^{\beta} \cdots S_i^{\gamma}$
and
$r_{\alpha, \beta, \ldots, \gamma} = \frac{1}{N} \sum_{i=1}^{N} Y_i \tau_i^{\alpha} \tau_i^{\beta} \cdots \tau_i^{\gamma}$
arise, where $\alpha, \beta, \ldots, \gamma$ represent replica indices, and the variables $Z_i$
and $Y_i$ come from enforcing the restriction of $C_1$ and $C_2$ connections per index,
respectively.

To proceed further, we have to make an assumption about the order parameters' symmetry. 
The assumption made here, and validated later on, is that of replica symmetry in the following 
representation of the order parameters and the related conjugate variables: 

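$$q_{\alpha, \beta, \ldots, \gamma} = a_q \int dx\, \pi(x)\, x^l, \qquad
\hat{q}_{\alpha, \beta, \ldots, \gamma} = a_{\hat{q}} \int d\hat{x}\, \hat{\pi}(\hat{x})\, \hat{x}^l,$$
$$r_{\alpha, \beta, \ldots, \gamma} = a_r \int dy\, \rho(y)\, y^l, \qquad
\hat{r}_{\alpha, \beta, \ldots, \gamma} = a_{\hat{r}} \int d\hat{y}\, \hat{\rho}(\hat{y})\, \hat{y}^l, \qquad (3)$$

where $l$ is the number of replica indices and $a_*$ are normalization factors that make
$\pi(x)$, $\hat{\pi}(\hat{x})$, $\rho(y)$ and $\hat{\rho}(\hat{y})$ probability distributions.
Unspecified integrals are carried out over the range $[-1, +1]$.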

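Extremizing the averaged expression with respect to the probability distributions, we obtain
the following free energy per spin:

$$\mathcal{F} = -\frac{1}{N} \langle \ln Z \rangle
= -\mathop{\mathrm{Extr}}_{\pi, \hat{\pi}, \rho, \hat{\rho}} \Bigg\{
\frac{C_1}{K_1} \left\langle \ln \frac{1 + \prod_{i=1}^{K_1} x_i}{2} \right\rangle_{\pi}
+ \frac{C_2}{K_2} \left\langle \ln \frac{1 + \prod_{i=1}^{K_2} y_i}{2} \right\rangle_{\rho}
- C_1 \left\langle \ln (1 + x \hat{x}) \right\rangle_{\pi, \hat{\pi}}
- C_2 \left\langle \ln (1 + y \hat{y}) \right\rangle_{\rho, \hat{\rho}}$$
$$+ \frac{1}{N} \left\langle \ln \mathrm{Tr}_{S, \tau} \prod_{i=1}^{N} \left[
\prod_{\mu=1}^{C_1} (1 + \hat{x}_{\mu i} S_i) \prod_{\mu=1}^{C_2} (1 + \hat{y}_{\mu i} \tau_i) \right]
\mathcal{P}(S \otimes \xi, \tau \otimes \eta) \right\rangle_{\hat{\pi}, \hat{\rho}, \mathcal{P}}
\Bigg\}, \qquad (4)$$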



where the brackets with the subscripts $\pi$ and $\hat{\pi}$ represent averages over the
probability distributions $\pi(x)$ and $\hat{\pi}(\hat{x})$ with respect to the variables denoted
by $x$ and $\hat{x}$, with and without subscripts, respectively. Similar notation is also used
for $\rho$ and $\hat{\rho}$. The bracket with the subscript $\mathcal{P}$ denotes the average with
respect to $\xi$ and $\eta$ following the data distribution $\mathcal{P}(\xi, \eta)$.

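Taking the functional derivative with respect to the distributions $\pi$, $\hat{\pi}$, $\rho$
and $\hat{\rho}$, we obtain the following saddle point equations:

$$\pi(x) = \frac{1}{N} \sum_{i=1}^{N} \left\langle \delta \left( x - \tanh \left[
F_i(\hat{x}_{\mu, j \in \mathcal{L}_1(\mu)/i}, \hat{y}_{\mu, j}; \xi, \eta)\, \xi_i
+ \sum_{\mu=1}^{C_1 - 1} \tanh^{-1} \hat{x}_{\mu} \right] \right)
\right\rangle_{\hat{\pi}, \hat{\rho}, \mathcal{P}},$$
$$\hat{\pi}(\hat{x}) = \left\langle \delta \left( \hat{x} - \prod_{i=1}^{K_1 - 1} x_i \right)
\right\rangle_{\pi}, \qquad (5)$$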



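where the effective fields denoted by $F_i$ with subscripts are implicitly defined as

$$\frac{e^{F_i(\hat{x}_{\mu, j \in \mathcal{L}_1(\mu)/i}, \hat{y}_{\mu, j}; \xi, \eta)\, \xi_i S_i}}
{2 \cosh F_i(\hat{x}_{\mu, j \in \mathcal{L}_1(\mu)/i}, \hat{y}_{\mu, j}; \xi, \eta)}
= \frac{\mathrm{Tr}_{S/S_i, \tau} \prod_{\mu} \prod_{j \in \mathcal{L}_1(\mu)/i} (1 + \hat{x}_{\mu j} S_j)
\prod_{\mu} \prod_{j \in \mathcal{L}_2(\mu)} (1 + \hat{y}_{\mu j} \tau_j)\,
\mathcal{P}(S \otimes \xi, \tau \otimes \eta)}
{\mathrm{Tr}_{S, \tau} \prod_{\mu} \prod_{j \in \mathcal{L}_1(\mu)/i} (1 + \hat{x}_{\mu j} S_j)
\prod_{\mu} \prod_{j \in \mathcal{L}_2(\mu)} (1 + \hat{y}_{\mu j} \tau_j)\,
\mathcal{P}(S \otimes \xi, \tau \otimes \eta)}, \qquad (6)$$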



and similarly for $\rho(y)$ and $\hat{\rho}(\hat{y})$. Notice that the notation $S/S_i$
represents the set of all dynamical variables $S$ except $S_i$. On the other hand,
$\mathcal{L}_1(\mu)$ and $\mathcal{L}_2(\mu)$ denote the sets of all indices of nonzero
components in the $\mu$th row of $A$ and $B$, respectively. The notation $\mathcal{L}_1(\mu)/i$
represents the set of all indices belonging to $\mathcal{L}_1(\mu)$ except $i$, and similarly
for the others.
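After solving these equations, the expectation of the overlap can be evaluated as

$$m_1 = \frac{1}{N} \left\langle \sum_{i=1}^{N} \xi_i\, \mathrm{sign} \langle S_i \rangle
\right\rangle_{A, B, \mathcal{P}} = \int dz\, \phi(z)\, \mathrm{sign}(z), \qquad (7)$$

where $\langle \cdots \rangle$ denotes thermal averages and

$$\phi(z) = \frac{1}{N} \sum_{i=1}^{N} \left\langle \delta \left( z - \tanh \left[
F_i(\hat{x}_{\mu, j \in \mathcal{L}_1(\mu)/i}, \hat{y}_{\mu, j}; \xi, \eta)\, \xi_i
+ \sum_{\mu=1}^{C_1} \tanh^{-1} \hat{x}_{\mu} \right] \right)
\right\rangle_{\hat{\pi}, \hat{\rho}, \mathcal{P}}, \qquad (8)$$

and similarly for $m_2$, the overlap between $\eta$ and its estimator.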

The performance of the current compression method can be measured by the vector
$m = (m_1, m_2)$. Hereafter, we use the term 'ferromagnetic' to specify perfect retrieval, that
is, $m_1 = 1$ (or $m_2 = 1$), while the term 'paramagnetic' implies distortion, that is,
$m_1 < 1$ (or $m_2 < 1$). For instance, a term such as 'ferromagnetic-paramagnetic phase' denotes
the phase characterized by the performance vector
$m \in \{(m_1, m_2) \,|\, m_1 = 1, m_2 < 1\}$, and so on.

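One can show that the ferromagnetic-ferromagnetic state (FF): $\pi(x) = \delta(x - 1)$,
$\hat{\pi}(\hat{x}) = \delta(\hat{x} - 1)$, $\rho(y) = \delta(y - 1)$ and
$\hat{\rho}(\hat{y}) = \delta(\hat{y} - 1)$ always satisfies Eq. (5). In addition, in the
limit $C_1, C_2 \to \infty$, three further solutions, describing the paramagnetic-paramagnetic
state (PP): $\pi(x) = \delta(x)$, $\hat{\pi}(\hat{x}) = \delta(\hat{x})$, $\rho(y) = \delta(y)$
and $\hat{\rho}(\hat{y}) = \delta(\hat{y})$, the paramagnetic-ferromagnetic state (PF):
$\pi(x) = \delta(x)$, $\hat{\pi}(\hat{x}) = \delta(\hat{x})$, $\rho(y) = \delta(y - 1)$ and
$\hat{\rho}(\hat{y}) = \delta(\hat{y} - 1)$, and the ferromagnetic-paramagnetic state (FP):
$\pi(x) = \delta(x - 1)$, $\hat{\pi}(\hat{x}) = \delta(\hat{x} - 1)$, $\rho(y) = \delta(y)$ and
$\hat{\rho}(\hat{y}) = \delta(\hat{y})$, are also analytically obtained for an arbitrary joint
distribution $\mathcal{P}(\xi, \eta)$. The free energies corresponding to these solutions follow
from Eq. (4) as

$$\mathcal{F}_{FF} = -\frac{1}{N} \mathrm{Tr}_{\xi, \eta}\, \mathcal{P}(\xi, \eta) \ln \mathcal{P}(\xi, \eta), \qquad
\mathcal{F}_{PP} = (R_1 + R_2) \ln 2,$$
$$\mathcal{F}_{FP} = R_2 \ln 2 - \frac{1}{N} \mathrm{Tr}_{\xi}\, \mathcal{P}(\xi) \ln \mathcal{P}(\xi), \qquad
\mathcal{F}_{PF} = R_1 \ln 2 - \frac{1}{N} \mathrm{Tr}_{\eta}\, \mathcal{P}(\eta) \ln \mathcal{P}(\eta), \qquad (9)$$

where subscripts stand for the corresponding states and
$\mathcal{P}(\xi) = \mathrm{Tr}_{\eta}\, \mathcal{P}(\xi, \eta)$ and
$\mathcal{P}(\eta) = \mathrm{Tr}_{\xi}\, \mathcal{P}(\xi, \eta)$
represent the marginal distributions of the two source vectors $\xi$ and $\eta$, respectively.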

Perfect decoding is theoretically possible if $\mathcal{F}_{FF}$ is the lowest among the above
four. The corresponding parameter regime, termed the achievable rate region, is shown in
Fig. 1(b) as the intersection of the inequalities

$$R_1 + R_2 > H_2(\xi, \eta), \qquad R_1 > H_2(\xi|\eta), \qquad R_2 > H_2(\eta|\xi), \qquad (10)$$

where $H_2(\xi, \eta) = -(1/N)\, \mathrm{Tr}_{\xi, \eta}\, \mathcal{P}(\xi, \eta) \log_2 \mathcal{P}(\xi, \eta)$,
$H_2(\xi|\eta) = H_2(\xi, \eta) - H_2(\eta)$ and $H_2(\eta|\xi) = H_2(\xi, \eta) - H_2(\xi)$.
It is worth noting that this coincides with the achievable rate region
saturated by the optimal data compression in the current framework, previously shown by
SW [10]. Namely, in the limit $C_1, C_2 \to \infty$, the current compression codes provide the
optimal performance for arbitrary information sources $\mathcal{P}(\xi, \eta)$.
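
The step from the four free energies to the three inequalities can be written out explicitly. The following is a sketch that assumes the free energies of Eq. (9) read $\mathcal{F}_{FF} = H(\xi,\eta)$, $\mathcal{F}_{PP} = (R_1 + R_2)\ln 2$, $\mathcal{F}_{FP} = R_2 \ln 2 + H(\xi)$ and $\mathcal{F}_{PF} = R_1 \ln 2 + H(\eta)$, where $H(\cdot) = H_2(\cdot) \ln 2$ is shorthand (introduced here only for this derivation) for the entropy per bit measured in nats.

```latex
\begin{align*}
\mathcal{F}_{FF} < \mathcal{F}_{PP}
  &\iff H_2(\xi,\eta)\ln 2 < (R_1 + R_2)\ln 2
  &&\iff R_1 + R_2 > H_2(\xi,\eta), \\
\mathcal{F}_{FF} < \mathcal{F}_{PF}
  &\iff H_2(\xi,\eta)\ln 2 < R_1 \ln 2 + H_2(\eta)\ln 2
  &&\iff R_1 > H_2(\xi,\eta) - H_2(\eta) = H_2(\xi|\eta), \\
\mathcal{F}_{FF} < \mathcal{F}_{FP}
  &\iff H_2(\xi,\eta)\ln 2 < R_2 \ln 2 + H_2(\xi)\ln 2
  &&\iff R_2 > H_2(\xi,\eta) - H_2(\xi) = H_2(\eta|\xi).
\end{align*}
```

Requiring all three comparisons to hold at once is the same as requiring the FF state to have the lowest free energy, which reproduces the intersection of inequalities in Eq. (10).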

For finite $C_1$ and $C_2$, the saddle point equations (5) can be solved numerically; but the
properties of the system depend strongly on the source distribution $\mathcal{P}(\xi, \eta)$,
which makes it difficult to go further without any assumption on the distribution. As a simple
but non-trivial example, we focus here on a component-wise correlated joint distribution

$$\mathcal{P}(S, \tau) = \prod_{i=1}^{N} \frac{1 + m_1 S_i + m_2 \tau_i + q S_i \tau_i}{4}, \qquad (11)$$

where the set of parameters $m_1$, $m_2$ and $q$ characterizes the data sources. For Eq. (11) to
be a distribution, these parameters must satisfy the four inequalities $1 + m_1 + m_2 + q > 0$,
$1 - m_1 + m_2 - q > 0$, $1 + m_1 - m_2 - q > 0$ and $1 - m_1 - m_2 + q > 0$.
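
For this component-wise source, the entropies entering the rate region can be computed per site. The Python sketch below is illustrative only: the function names are hypothetical, the single-site probability is taken to be $(1 + m_1 s + m_2 t + q s t)/4$, and the parameter values in the usage lines are arbitrary choices inside the feasible region, not values quoted from this letter.

```python
import math

def joint_probs(m1, m2, q):
    """Single-site probabilities P(s, t) for s, t in {+1, -1}."""
    probs = {}
    for s in (1, -1):
        for t in (1, -1):
            p = (1 + m1 * s + m2 * t + q * s * t) / 4
            if p < 0:
                raise ValueError("parameters violate the positivity constraints")
            probs[(s, t)] = p
    return probs

def entropy_bits(ps):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in ps if p > 0)

def slepian_wolf_check(m1, m2, q, r1, r2):
    """Return (H(xi,eta), H(xi), H(eta), achievable) for rates r1 and r2."""
    h_joint = entropy_bits(joint_probs(m1, m2, q).values())
    h_xi = entropy_bits([(1 + m1) / 2, (1 - m1) / 2])
    h_eta = entropy_bits([(1 + m2) / 2, (1 - m2) / 2])
    achievable = (
        r1 + r2 > h_joint
        and r1 > h_joint - h_eta   # R1 > H(xi | eta)
        and r2 > h_joint - h_xi    # R2 > H(eta | xi)
    )
    return h_joint, h_xi, h_eta, achievable

# Illustrative parameters (assumed), tested at two symmetric rate pairs.
for rate in (0.5, 0.6):
    print(rate, slepian_wolf_check(0.7, 0.6, 0.8, rate, rate))
```

Because the distribution factorizes over sites, the per-bit entropies of the full vectors equal these single-site entropies, so the check applies directly to the rates $R_1$ and $R_2$.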
Solving Eq. (1) rigorously for decoding is computationally hard in general cases. However,
one can construct a practical decoding algorithm based on belief propagation (BP) or
the Thouless-Anderson-Palmer (TAP) approach. It has recently been shown that these
two frameworks provide the same algorithm in the case of ECC. This is also the case
in the current context. For the distribution (11), the algorithm derived from the BP/TAP
frameworks becomes

$$m^{S}_{\mu i} = \frac{a_{\mu i} + m_1 + m_2 a_{\mu i} b_i + q b_i}{1 + m_1 a_{\mu i} + m_2 b_i + q a_{\mu i} b_i}, \qquad
m^{\tau}_{\mu i} = \frac{b_{\mu i} + m_2 + m_1 a_i b_{\mu i} + q a_i}{1 + m_1 a_i + m_2 b_{\mu i} + q a_i b_{\mu i}},$$
$$\hat{m}^{S}_{\mu i} = u_{\mu} \prod_{j \in \mathcal{L}_1(\mu)/i} m^{S}_{\mu j}, \qquad
\hat{m}^{\tau}_{\mu i} = v_{\mu} \prod_{j \in \mathcal{L}_2(\mu)/i} m^{\tau}_{\mu j}, \qquad (12)$$

where we denote
$a_{\mu i} = \tanh \big( \sum_{\nu \in \mathcal{M}_1(i)/\mu} \tanh^{-1} \hat{m}^{S}_{\nu i} \big)$ and
$a_i = \tanh \big( \sum_{\mu \in \mathcal{M}_1(i)} \tanh^{-1} \hat{m}^{S}_{\mu i} \big)$, and
similarly for the $b$'s. Here, $\mathcal{M}_1(i)$ and $\mathcal{M}_2(i)$ indicate the sets of all
indices of nonzero components in the $i$th column of the sparse matrices $A$ and $B$,
respectively. Eq. (12) can be solved iteratively from appropriate initial conditions. After
obtaining a solution, approximate posterior means can be calculated for $i = 1, 2, \ldots, N$ as

$$m^{S}_{i} = \frac{a_i + m_1 + m_2 a_i b_i + q b_i}{1 + m_1 a_i + m_2 b_i + q a_i b_i}, \qquad
m^{\tau}_{i} = \frac{b_i + m_2 + m_1 a_i b_i + q a_i}{1 + m_1 a_i + m_2 b_i + q a_i b_i}, \qquad (13)$$

which provide an approximation to the Bayes-optimal estimators as
$\hat{\xi}_i = \mathrm{sign}(m^{S}_{i})$ and $\hat{\eta}_i = \mathrm{sign}(m^{\tau}_{i})$,
respectively.
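
The rational form of the single-site posterior means can be checked against direct enumeration. The sketch below is not the full iterative decoder: it assumes that the checks contribute independent evidence factors $(1 + a S)/2$ and $(1 + b \tau)/2$ for one site, combines them with the single-site prior $(1 + m_1 S + m_2 \tau + q S \tau)/4$, and compares the closed-form means with a brute-force sum over the four $(S, \tau)$ states. Function names and parameter values are illustrative.

```python
import itertools

def posterior_means(a, b, m1, m2, q):
    """Closed-form posterior means of S and tau for one site."""
    norm = 1 + m1 * a + m2 * b + q * a * b
    mean_s = (a + m1 + m2 * a * b + q * b) / norm
    mean_t = (b + m2 + m1 * a * b + q * a) / norm
    return mean_s, mean_t

def brute_force_means(a, b, m1, m2, q):
    """Same quantities by summing over the four (S, tau) configurations."""
    z = num_s = num_t = 0.0
    for s, t in itertools.product((1, -1), repeat=2):
        prior = (1 + m1 * s + m2 * t + q * s * t) / 4
        weight = prior * (1 + a * s) / 2 * (1 + b * t) / 2
        z += weight
        num_s += s * weight
        num_t += t * weight
    return num_s / z, num_t / z

def decide(mean):
    """Sign decision used as the bitwise estimate."""
    return 1 if mean >= 0 else -1

for a, b in [(0.0, 0.0), (0.3, -0.5), (-0.9, 0.2)]:
    closed = posterior_means(a, b, 0.7, 0.6, 0.8)
    brute = brute_force_means(a, b, 0.7, 0.6, 0.8)
    print(a, b, closed, brute, decide(closed[0]), decide(closed[1]))
```

With no evidence from the checks ($a = b = 0$) the means reduce to the prior biases $m_1$ and $m_2$, which is a quick sanity check on the signs of the individual terms.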

In order to investigate the efficacy of the current method for finite $C_1$ and $C_2$, we have
numerically solved Eqs. (5) and (12) for $K_1 = K_2 = 6$ and $C_1 = C_2 = 3$
($R_1 = R_2 = 1/2$), the results of which are summarized in Fig. 2. Numerical results for the
saddle point equations (5) were obtained by an iterative method using 10^ - 10^ bin models for
each probability distribution. 10 - 10^ updates were sufficient for convergence in most cases.
Similarly to the case of $C_1, C_2 \to \infty$, there can be four types of solutions,
corresponding to combinations of decoding success and failure on the two sources. The obtained
phase diagram is quite similar to that for $C_1, C_2 \to \infty$. This implies that the current
compression code theoretically has a good performance, close to the optimal one that is
saturated in the limit $C_1, C_2 \to \infty$, although the choice of $C_1 = C_2 = 3$ is far from
such a limit.

However, this does not directly mean that the suggested performance can be obtained in
practice. Since the variables are updated locally in the BP/TAP decoding algorithm (12),
it may become difficult to find the thermodynamically dominant state when suboptimal states
with large basins of attraction appear. This suggests that the practical
performance for perfect decoding is determined by the spinodal points of the suboptimal
states, similarly to the case of ECC. In order to confirm this conjecture, we have numerically
compared the practical limit of perfect decoding obtained by the BP/TAP decoding
algorithm (12) with the spinodal points of the non-FF solutions. The two results are in
excellent agreement, supporting our conjecture. In Fig. 2, the perfectly decodable region
obtained by the BP/TAP algorithm for $m_1 = 0.7$ is indicated as the area surrounded
by the spinodal points and the boundaries of the feasible region, $1 + 0.7 - m_2 - q = 0$ and
$1 - 0.7 + m_2 - q = 0$. This looks narrow compared to the theoretical limit, which might give
a negative impression of the practical utility of this code. Nevertheless, we still consider
that the current method may be practically useful, because the amount of information that can
be represented by parameters in the region is not as small as the area suggests.

Fig. 2 - Phase diagram for the $K_1 = K_2 = 6$ and $C_1 = C_2 = 3$ code in the case of the
component-wise correlated information source (11). This figure shows that the feasible region
in the $m_2$-$q$ plane for $m_1 = 0.7$ is classified into three states. Phase boundaries obtained
by numerical methods are indicated by o with error bars (FF/PP and FF/PF) and O (PF/PP). These
are close to those for $K_1 = K_2 \to \infty$, $C_1 = C_2 \to \infty$ (curves and the vertical
line). Practically decodable limits of the TAP/BP algorithm obtained for N = 10* systems are
indicated as •. These are evaluated well by the spinodal points of non-FF solutions (□ with
error bars). Inset: The practical limits are represented by the sizes of transmitted
information. The horizontal and vertical axes show the entropy of the second source $\eta$ and
the joint entropy, respectively.

In summary, we have developed an efficient method of data compression in a multi-terminal
communication network, taking advantage of sparse-matrix-based linear compression
codes. We observed several practical properties of codes of this type in the simplest
model of data compression for network communication, proposed by SW. Studying
the typical performance of linear compression codes in a network, which complements
the methods used in the information theory literature, is a first step towards understanding
the typical properties of network-based systems.